Add timestamp rules and constraints to decoder in Whisper example #3054

Merged: greenrazer merged 3 commits into huggingface:main from rsb-tbg:add-timestamps-to-whisper on Aug 18, 2025

@rsb-tbg rsb-tbg commented Aug 12, 2025

Summary

This PR implements timestamp handling rules in the Whisper decoder so that timestamps generated during transcription are accurate and well-formed. The rules follow the reference implementation in OpenAI Whisper: https://github.com/openai/whisper/blob/e8622f9afc4eba139bf796c210f5c01081000472/whisper/decoding.py#L439

New Example Output

cargo run --example whisper --release --features="symphonia,metal"
    Finished `release` profile [optimized] target(s) in 0.44s
     Running `target/release/examples/whisper`
No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
pcm data loaded 176000
loaded mel: [1, 80, 3000]
0.0s -- 30.0s
  0.0s-1.8s: 
  1.8s-10.4s:  and so my fellow Americans ask not what your country can do for you ask what you can do for your country

Changes Made

Core Timestamp Rules Implementation

  • Added apply_timestamp_rules() method that enforces OpenAI Whisper's timestamp constraints
  • Implemented 4 key timestamp rules:
    1. Timestamp Pairing: Timestamps must come in pairs, except directly before EOT
    2. Non-decreasing Order: Timestamps cannot decrease in value
    3. Forced Initial Timestamp: Suppress non-timestamp tokens at the beginning of transcription
    4. Probability-based Preference: When the summed probability of all timestamp tokens exceeds the probability of every individual non-timestamp token, only timestamp tokens are considered
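
As a rough illustration of rules 1 and 2, here is a minimal sketch that masks a plain slice of logits, loosely modeled on the OpenAI reference logic linked in the summary. It is not the code from this PR: the function name, the parameter names, and the use of a `&mut [f32]` instead of a candle tensor are all assumptions made for the example. It also assumes `logits.len()` is larger than `timestamp_begin`.

```rust
/// Hypothetical helper (not the PR's API): mask logits in place so that the
/// next token respects the pairing and non-decreasing rules.
/// `sampled` holds only the tokens generated so far (prompt excluded).
fn mask_pairing_and_order(logits: &mut [f32], sampled: &[u32], timestamp_begin: u32, eot: u32) {
    let ts = timestamp_begin as usize;
    let is_ts = |t: u32| t >= timestamp_begin;

    let last_was_ts = sampled.last().map_or(false, |&t| is_ts(t));
    // With fewer than two sampled tokens, the penultimate token is treated
    // as a timestamp, as in the reference implementation.
    let penultimate_was_ts = sampled.len() < 2 || is_ts(sampled[sampled.len() - 2]);

    if last_was_ts {
        if penultimate_was_ts {
            // Two timestamps in a row: the next token must be text.
            logits[ts..].fill(f32::NEG_INFINITY);
        } else {
            // Text followed by a timestamp: the next token must be a
            // timestamp or EOT, so suppress every token below EOT.
            logits[..eot as usize].fill(f32::NEG_INFINITY);
        }
    }

    // Rule 2: forbid timestamps smaller than the most recent one.
    if let Some(&last_ts) = sampled.iter().rev().find(|&&t| is_ts(t)) {
        // After a closing timestamp the same value may be repeated to open
        // the next segment; otherwise the segment must have nonzero length.
        let bound = if last_was_ts && !penultimate_was_ts {
            last_ts
        } else {
            last_ts + 1
        };
        let end = (bound as usize).min(logits.len());
        logits[ts..end].fill(f32::NEG_INFINITY);
    }
}
```

For example, with a toy vocabulary where `eot` is 5 and timestamps start at 8, calling this after the sampled tokens `[8, 1, 10]` leaves only EOT, the remaining special tokens, and timestamps 10 and above selectable.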

Configuration Enhancements

  • Added max_initial_timestamp_index parameter to limit the maximum initial timestamp value
  • Changed timestamps default to true (was previously disabled by default)
  • Updated decoder constructor to accept the new timestamp constraint parameter
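
The effect of `max_initial_timestamp_index` on the first decoding step can be sketched as follows. This is an illustration under assumptions, not the PR's code: the function name and the plain `&mut [f32]` logits are hypothetical, and the caller is assumed to invoke it only when no tokens have been sampled yet. In the OpenAI reference implementation each timestamp index corresponds to 0.02 s, so an index of 50 would cap the first timestamp at 1.0 s.

```rust
/// Hypothetical helper (not the PR's API): constrain the very first sampled
/// token to be a timestamp no later than `max_initial_timestamp_index`.
fn mask_initial_step(
    logits: &mut [f32],
    timestamp_begin: u32,
    max_initial_timestamp_index: Option<u32>,
) {
    let ts = timestamp_begin as usize;

    // Rule 3: the first sampled token must be a timestamp, so suppress
    // everything below the first timestamp token.
    logits[..ts].fill(f32::NEG_INFINITY);

    // Optionally suppress timestamps beyond the allowed initial index.
    if let Some(max_idx) = max_initial_timestamp_index {
        let first_forbidden = ts + max_idx as usize + 1;
        if first_forbidden < logits.len() {
            logits[first_forbidden..].fill(f32::NEG_INFINITY);
        }
    }
}
```

Passing `None` keeps every timestamp token available at the first step, which is one plausible way to model a disabled limit.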

Code Structure Improvements

  • Refactored decoder methods to use self.model consistently instead of mutable borrows
  • Enhanced timestamp token detection and validation logic
  • Added comprehensive masking system for applying multiple timestamp constraints
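
One way to combine several constraints is to accumulate them into a single additive mask and apply it once per decoding step. The sketch below shows that idea with a buffer that is allocated once and cleared between steps. The `MaskBuffer` type and its methods are hypothetical names for this example and are not taken from the PR, which operates on candle tensors.

```rust
use std::ops::Range;

/// Hypothetical reusable additive mask: 0.0 keeps a token, negative
/// infinity suppresses it.
struct MaskBuffer {
    values: Vec<f32>,
}

impl MaskBuffer {
    /// Allocate the buffer once, sized to the vocabulary.
    fn new(vocab_size: usize) -> Self {
        Self { values: vec![0.0; vocab_size] }
    }

    /// Clear all constraints for the next step without reallocating.
    fn reset(&mut self) {
        self.values.fill(0.0);
    }

    /// Suppress a range of token ids; out-of-range bounds are clamped.
    fn suppress(&mut self, range: Range<usize>) {
        let end = range.end.min(self.values.len());
        let start = range.start.min(end);
        self.values[start..end].fill(f32::NEG_INFINITY);
    }

    /// Add the accumulated mask to the logits in place.
    fn apply(&self, logits: &mut [f32]) {
        for (logit, mask) in logits.iter_mut().zip(&self.values) {
            *logit += *mask;
        }
    }
}
```

Each rule then only needs to call `suppress` with the range it forbids, and overlapping ranges from different rules compose naturally because a suppressed entry stays suppressed.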

Benefits

✅ More accurate timestamp generation following OpenAI Whisper specifications
✅ Prevents malformed timestamp sequences
✅ Configurable initial timestamp constraints for better control
✅ Improved transcription quality with proper temporal alignment

Testing

The implementation follows the timestamp rules from the original OpenAI Whisper codebase, ensuring compatibility and correctness.

Copilot AI review requested due to automatic review settings August 12, 2025 21:51

Copilot AI left a comment

Pull Request Overview

This PR implements comprehensive timestamp handling rules in the Whisper decoder to ensure accurate and well-formed timestamp generation during transcription, following OpenAI Whisper's specifications.

Key Changes:

  • Added apply_timestamp_rules() method implementing 4 core timestamp constraints (pairing, non-decreasing order, forced initial timestamp, probability-based preference)
  • Added max_initial_timestamp_index parameter for configurable initial timestamp limits
  • Changed timestamps to be enabled by default and refactored decoder methods for consistent model access

Comment thread candle-examples/examples/whisper/main.rs
Comment thread candle-examples/examples/whisper/main.rs Outdated
Comment thread candle-examples/examples/whisper/main.rs Outdated
@greenrazer greenrazer merged commit d7c5c8a into huggingface:main Aug 18, 2025
16 of 17 checks passed
@greenrazer

Thanks!

john-sharratt pushed a commit to john-sharratt/candle that referenced this pull request May 7, 2026
…ggingface#3054)

* Apply timestamp rules in whisper decoder and add support for maximum initial timestamp index

* Optimize mask generation in decoder by pre-allocating a reusable buffer

* Refactor timestamp probability calculations in decoder to use log-softmax for numerical stability
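
To show why working in log space helps with the probability-based preference rule, here is a minimal sketch on a plain slice of logits. It is an assumption-laden illustration rather than the PR's code: `log_sum_exp` and `prefer_timestamps` are hypothetical names, and the real decoder works on candle tensors. Subtracting the maximum before exponentiating keeps the sums finite even for very large logits.

```rust
/// Numerically stable log(sum(exp(x))) over a slice.
fn log_sum_exp(xs: &[f32]) -> f32 {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if max == f32::NEG_INFINITY {
        // Empty or fully masked slice: total probability is zero.
        return f32::NEG_INFINITY;
    }
    max + xs.iter().map(|&x| (x - max).exp()).sum::<f32>().ln()
}

/// Hypothetical helper (not the PR's API): if the total log-probability of
/// all timestamp tokens exceeds the best single text token, suppress the
/// text tokens. Returns true when the suppression was applied.
fn prefer_timestamps(logits: &mut [f32], timestamp_begin: usize) -> bool {
    // log-softmax(x_i) = x_i - log_z
    let log_z = log_sum_exp(logits);
    let timestamp_logprob = log_sum_exp(&logits[timestamp_begin..]) - log_z;
    let max_text_logprob = logits[..timestamp_begin]
        .iter()
        .copied()
        .fold(f32::NEG_INFINITY, f32::max)
        - log_z;

    if timestamp_logprob > max_text_logprob {
        logits[..timestamp_begin].fill(f32::NEG_INFINITY);
        true
    } else {
        false
    }
}
```

A naive version that exponentiates the logits directly would overflow to infinity for values near 1000, whereas this form compares the two quantities without ever leaving the representable range.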